NSF PAR Search | NSF Public Access Repository

X-Blossom: Massive Parallelization of Graph Maximum Matching

https://doi.org/10.14778/3748191.3748199

Fan, Dayi; Lee, Rubao; Zhang, Xiaodong (June 2025, Proceedings of the VLDB Endowment)

The blossom algorithm computes maximum matchings in graphs and has been widely applied across diverse domains, including machine learning, economic analysis, and other essential data analytics applications. As data scales and the demand for real-time processing intensifies, high-performance computing solutions have become indispensable. Over the years, substantial research efforts have been dedicated to improving the sequential blossom algorithm. However, developing an efficient parallel solution remains highly challenging due to the algorithm's intricate execution patterns, sequential recursive dependencies, dynamic data structure modifications, and inefficient path search. By thoroughly analyzing existing solutions, we have identified critical issues and proposed a new parallel framework called X-Blossom. This framework eliminates recursion entirely, enables efficient searches for multiple disjoint paths, and employs a simple path table to trace paths, removing the need for dynamic graphs and trees. These efforts in algorithm development result in significant performance enhancement. Extensive experiments on real-world datasets show that X-Blossom outperforms all existing solutions, achieving up to 992x speedup compared to the fastest sequential baseline, and an average of 431x speedup over the state-of-the-art parallel solution using 8 cores. It also demonstrates excellent scalability, achieving an average speedup of 1.72x when threads double in scalability tests to 64 cores. To the best of our knowledge, X-Blossom is the fastest solution for this class of graph algorithms.
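To make the problem in this abstract concrete, the sketch below computes a maximum matching (a largest set of edges in which no two edges share a vertex) by exhaustive search on a tiny general graph. It is not the blossom algorithm and not X-Blossom; the function name `maximum_matching_bruteforce` and the example graph are illustrative assumptions, and the approach is only practical for a handful of edges.

```python
from itertools import combinations

def maximum_matching_bruteforce(edges):
    """Return a largest set of edges in which no two edges share a vertex.

    Exhaustive search for illustration only (hypothetical helper, not the
    blossom algorithm): the running time grows exponentially with len(edges).
    """
    edges = list(edges)
    # Try the largest candidate sizes first; the first valid set found is maximum.
    for size in range(len(edges), 0, -1):
        for subset in combinations(edges, size):
            endpoints = [v for edge in subset for v in edge]
            if len(endpoints) == len(set(endpoints)):
                return list(subset)
    return []

# Example graph (an assumption): an odd cycle 0-1-2-3-4-0 plus vertex 5 attached to 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (2, 5)]
matching = maximum_matching_bruteforce(edges)
print(len(matching))  # prints 3
```

The example graph contains an odd cycle, so it is not bipartite; general graphs of this kind are the case that matching algorithms such as blossom are designed to handle.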
Free, publicly-accessible full text available June 1, 2026
X-TED: Massive Parallelization of Tree Edit Distance

https://doi.org/10.14778/3654621.3654634

Fan, Dayi; Lee, Rubao; Zhang, Xiaodong (March 2024, Proceedings of the VLDB Endowment)

The tree edit distance (TED), a metric that quantifies the dissimilarity between two trees, is used in a wide spectrum of applications in artificial intelligence, bioinformatics, and other areas. As applications continue to scale in data size, with a growing demand for fast response time, TED has become increasingly data- and compute-intensive. Over the years, researchers have made dedicated efforts to improve sequential TED algorithms by reducing their high complexity. However, achieving efficient parallel TED computation in both algorithm and implementation is challenging because of its dynamic programming nature, which involves non-trivial issues of data dependency, runtime execution pattern changes, and optimal utilization of limited parallel resources. Having comprehensively investigated the bottlenecks in the existing parallel TED algorithms, we develop a massively parallel computation framework for TED and its implementation on GPU, called X-TED. For a given TED computation, X-TED applies a fast preprocessing algorithm to identify dependency relationships among millions of dynamic programming tables. Subsequently, it adopts a dynamic parallel strategy to handle various processing stages, aiming to best utilize GPU cores and the limited device memory in an adaptive and automatic way. Our intensive experimental results demonstrate that X-TED surpasses all existing solutions, achieving up to 42x speedup over the state-of-the-art sequential AP-TED, and outperforming the existing multicore parallel MC-TED by an average speedup of 31x.
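As a small illustration of the metric this abstract refers to, the sketch below computes the edit distance between two ordered, labeled trees with a memoized recursion over forests. It is a textbook-style formulation, not X-TED, AP-TED, or MC-TED. The tree encoding `(label, children)`, the unit cost for each insert, delete, and rename, and all function names are assumptions made for this example.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Unit costs are an assumption of this sketch.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (memoized recursion)."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + kids_f, g) + 1
    # Insert the rightmost root of g; its children take its place.
    insert = forest_distance(f, g[:-1] + kids_g) + 1
    # Map the two rightmost roots to each other (rename if labels differ).
    match = (forest_distance(f[:-1], g[:-1])
             + forest_distance(kids_f, kids_g)
             + (label_f != label_g))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()), ("d", ())))
print(tree_edit_distance(t1, t2))  # prints 1 (rename c to d)
```

Each memoized call depends on the results of smaller forest pairs, which gives a rough sense of the data dependencies that make parallelizing this kind of dynamic program non-trivial.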
Full Text Available
RR-Compound: RDMA-Fused gRPC for Low Latency, High Throughput, and Easy Interface

https://doi.org/10.1109/TPDS.2024.3404394

Geng, Liang; Wang, Hao; Meng, Jingsong; Fan, Dayi; Ben-Romdhane, Sami; Pichumani, Hari Kadayam; Phegade, Vinay; Zhang, Xiaodong (August 2024, IEEE Transactions on Parallel and Distributed Systems)

We have developed open-source software called RR-Compound that provides low latency, high throughput, and an easy-to-use interface.
Full Text Available